Large-scale genotyping and phenotyping of a worldwide winter wheat genebank for its use in pre-breeding

Plant genetic resources (PGR) stored at genebanks are humanity’s crop diversity savings for the future. Information on PGR, contrasted with modern cultivars, is key to selecting PGR parents for pre-breeding. Genotyping-by-sequencing was performed for 7,745 winter wheat PGR samples from the German Federal ex situ genebank at IPK Gatersleben and for 325 modern cultivars. Whole-genome shotgun sequencing was carried out for 446 diverse PGR samples and 322 modern cultivars and lines. In 19 field trials, 7,683 PGR and 232 elite cultivars were characterized for resistance to yellow rust, one of the major threats to wheat worldwide. Yield breeding values of 707 PGR were estimated using hybrid crosses with 36 cultivars, an approach that reduces the lack of agronomic adaptation of PGR and provides better estimates of their contribution to yield breeding. Cross-validations support the interoperability between genomic and phenotypic data. The data presented here are a stepping stone to unlock the functional variation of PGR for European pre-breeding and are the basis for future breeding and research activities.

Seed sources. PGR from the 'TRI' catalog were obtained directly from the IPK genebank through GBIS, while seeds of PGR from the 'B' collection were kindly provided by Dr. Andreas Börner. The IPK genebank can normally provide ~5 g of seeds per PGR. Thus, to meet the seed requirements of large-scale research activities, seeds of 9,135 PGR were first multiplied using single-row propagation plots. In parallel, 173 of these PGR plus 10 additional ones were multiplied under greenhouse conditions within the framework of the GenDiv project 32 . For elite cultivars, seeds were obtained from the local market either recently or in previous projects 33,34 . Seeds of the advanced breeding lines were provided by four breeding companies based in Germany 35,36 .
For field-propagated PGR, one representative ear was bagged in the case of homogeneous PGR and two in the case of clearly non-homogeneous PGR. Following a single-seed descent (SSD) method, seeds from isolated ears were harvested separately from the rest of the plot and further propagated using an ear-to-row method. Hereafter, we refer to these PGR samples as SSD-PGR. Greenhouse-propagated PGR were fixed by applying the SSD method for two consecutive generations (2 × SSD) for each PGR 32 .
Genomic data processing. In a first step, reads of each of the 171 modern cultivars sequenced in triplicate were pooled by original genotype. In this regard, and unless stated otherwise, the terms samples and genotypes are used interchangeably when referring to genomic data throughout the manuscript. Low-quality bases and adapter sequences were removed from GBS raw reads using cutadapt (v1.16) 39 with a minimum read length of 30 bp. This step was subsequently checked with FastQC (v0.11.7) 40 . After this, BWA-MEM (v0.7.17) 41 was applied at default settings to align the retained high-quality reads against the hexaploid wheat reference genome assembly of Chinese Spring (RefSeq v1.0) 42 . For WGS, reads were aligned with minimap2 (v2.17) 43 , with the genome index size set to 50 Gb and all other parameters left at their defaults. Alignments were converted into binary alignment map (BAM) format using SAMtools (v1.9) 44 and then sorted with NovoSort ® (v3.06.05). Variant calling was done using the mpileup and call functions from SAMtools/BCFtools (v1.8) 45 with parameters -DV and a minimum read quality (q) cutoff of 20. Subsequently, biallelic variants were retained in the generated variant call format (VCF) files using a custom awk script if they met a minimum QUAL of 40 and a minimum read depth of 2 for homozygous calls and 4 for heterozygous calls in the case of GBS, while these parameters were set to 40, 1, and 2, respectively, for WGS. From here onwards, we refer to these VCF files for GBS and WGS polymorphic variants as "unfiltered" SNP data. For the assessment of molecular diversity, linkage disequilibrium and genomic-phenotypic data interoperability (see last two sections of Methods), VCF files were further filtered using BCFtools and base and data.
(Table 1). An additional wheat genotype denoted as 'Filler' was considered due to technical reasons during sowing, but it was not part of the tested entries and has no passport data.
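The QUAL and read-depth thresholds given under Genomic data processing can be illustrated with a minimal Python sketch. It is not the custom awk script used in this study: the function name `filter_site`, the input layout, and in particular the choice to set calls below the depth threshold to missing are assumptions made for illustration only.

```python
# Thresholds taken from the text: (min QUAL, min depth homozygous, min depth heterozygous)
THRESHOLDS = {"GBS": (40, 2, 4), "WGS": (40, 1, 2)}


def filter_site(qual, n_alt_alleles, calls, platform):
    """Return the filtered genotype calls of one site, or None if the site is dropped.

    calls: list of (genotype, depth) tuples, genotype in {"0/0", "0/1", "1/1", "./."}.
    Assumption: a call with insufficient read depth is set to missing ("./.").
    """
    min_qual, min_dp_hom, min_dp_het = THRESHOLDS[platform]
    # Keep only biallelic sites that reach the minimum QUAL
    if n_alt_alleles != 1 or qual < min_qual:
        return None
    filtered = []
    for genotype, depth in calls:
        if genotype == "./.":
            filtered.append(genotype)
            continue
        first, second = genotype.split("/")
        min_depth = min_dp_hom if first == second else min_dp_het
        filtered.append(genotype if depth >= min_depth else "./.")
    return filtered


site = [("0/0", 2), ("0/1", 3), ("1/1", 1)]
gbs_calls = filter_site(50, 1, site, "GBS")
wgs_calls = filter_site(50, 1, site, "WGS")
```

With the same read depths, the heterozygous call supported by three reads is kept under the WGS thresholds but not under the stricter GBS thresholds.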
Given the large number of entries to be screened, the plant material was tested in an unbalanced fashion by considering 1,428-1,697 entries per experiment (Table 1). Experiments were conducted in harvest years 2015-2020 at the locations Gatersleben and Schackstedt. In all experiments, wheat plants were cultivated under the rainfed continental conditions predominant at both German locations. Chemical crop protection comprised only the use of herbicides, and no fertilizers were applied. In each experiment, the experimental unit corresponded to a 0.4 m² plot. An alpha lattice design with two complete replications divided into incomplete blocks was used to account for uncontrolled spatial variation. Except for experiment SST_2019_6, in which infection severity was scored in only one replication, both replications were measured in each experiment. In addition, GAT_2019_6 is the only experiment of this group in which YR infections were scored at two dates (early and late). For this experiment, only the maximum of the two dates, as the most differentiating value, was retained for each plot. All other experiments considered only a single scoring date after the clear onset of YR infections. Infection severity was expressed on a scoring scale from 1 (no symptoms) to 9 (severe infection) following the official protocols of the German Federal Plant Variety Office 48 .
Precision balanced experiments. Seven experiments were conducted to test 200 European elite cultivars and 600 SSD-PGR (Table 1). Elite cultivars were pre-selected to maximize their genomic diversity. The 600 SSD-PGR are not only a highly diverse sample but also harbor an increased proportion of resistant genotypes, which generally occur at low frequencies in genebanks 14 . Among the 600 SSD-PGR, only two trace back to a double-sampled non-homogeneous PGR. Three wheat entries that do not belong to the 200 cultivars plus 600 SSD-PGR but were used to estimate experiment effects lack passport data and were thus denoted as Filler_1-3. Experiments were conducted during harvest years 2019 and 2020 at the German locations Gatersleben, Quedlinburg, Wetze, and Rosenthal (Table 1). In all experiments, wheat plants were cultivated under the rainfed continental conditions predominant at all considered locations. Experiments GAT_YR_2018 and -_2019 were based on natural infections, while the other five experiments were artificially inoculated. Experiments ROS_YR_2018 and -_2019 relied on inoculations applied directly to the tested material, whereas surrounding susceptible spreader plots served as the initial inoculum source for the tested entries in the other three inoculated experiments. Artificial inoculations were based on spore mixtures of race isolates from the genetic groups 'PstS7' and 'PstS10' collected during past crop seasons. As reported by the Global Rust Reference Center 16 , these two aggressive race groups are virulent against resistance genes Yr1, Yr2, Yr3, Yr4, Yr6, Yr7, Yr9, Yr17, Yr25, Yr32 and also against the resistance specificities of genotypes 'Spalding Prolific' and 'Avocet S'. In addition, both race groups are avirulent against resistance genes Yr5, Yr8, Yr10, Yr15, Yr24 and Yr27. Notably, 'PstS7' is virulent against the resistance specificity of the genotype 'Ambition', whereas 'PstS10' is avirulent.
Experiments conducted in Gatersleben considered chemical weed control without use of fertilizers, while fertilizers but no herbicides were applied in the QLB_YR_2019 experiment (Table 1). In the experiments conducted in Wetze and Rosenthal, both herbicides and fertilizers were applied. Except for the QLB_YR_2019 experiment, where plant material was tested using a single replication in incomplete blocks, all experiments considered two complete replications and an alpha lattice design. The experimental unit was a 0.2-0.5 m² plot, with the size fixed for each test location. Disease symptoms were scored at a single timepoint after the onset of YR infection in the WTZ_YR_2019, ROS_YR_2018 and -_2019 experiments, while early and late infections were recorded in the other four experiments. QLB_YR_2019 was the only experiment where infection was originally recorded as a percentage instead of on a 1-9 scoring scale. Percentage scorings were transformed into a 1-9 scale using the scale conversion key of the German Federal Plant Variety Office 48 .
[…] artificial inoculations using spreader plots (Artificial "S") or full experiment artificial inoculations (Artificial "F"). e Crop management considered chemical control against weeds (H) and/or use of fertilizers (F). f Spatial variation was corrected using an alpha (α lattice) design with blocks subdivided into incomplete blocks or only considering incomplete blocks (IB). g In case of 1 (2) only one of two replicates was measured. h Number of entries according to the original field plan.
Yield evaluations for informed pre-breeding. Two groups of field experiments, summing up to 26 in total, were conducted to evaluate the contribution of PGR to yield improvement using 'Elite × PGR' crosses (Table 2):
Yield breeding value estimation experiments. A total of 751 PGR (234 PGR plus 527 SSD-PGR denoted with the suffix "_ISO") and four elite cultivars were crossed as male parents with up to 16 out of 42 elite cultivars using chemical hybridization agents in the field. Specifically, 1,569 out of the 1,577 resulting hybrids corresponded to 'Elite × PGR' factorial crosses, while the remaining eight hybrids were 'Elite 1 × Elite 2' crosses. PGR serving as pollen donors comprise a diverse sample from the IPK genebank 14 and were pre-selected for their pronounced anther extrusion based on large-scale screenings of genebank material. This pre-selection ensured a sufficient quantity of field-produced hybrid seed to conduct multiple field experiments. Hybrid seed of 'Elite × PGR' and 'Elite 1 × Elite 2' crosses was produced at the IPK facilities. Sterility of the female parents was checked by bagging at least one plant per crossing block 49 . In addition, during the season following seed production, the uniformity and hybridity (a clear morphological differentiation from the female parent) of F1 seeds were controlled by growing each hybrid and both of its parents side-by-side in 0.2 m² plots. In parallel, the 1,577 IPK hybrids were tested for their grain yield together with 347 hybrids from the State Plant Breeding Institute of the University of Hohenheim (Landessaatzuchtanstalt, LSA), 518 parent genotypes, and a set of 40 checks. Yield testing was conducted in a staggered fashion over five consecutive years using partially overlapping entry groups (series), each composed of 501 to 617 genotypes (Table 2). Except for series 5, which was tested in only two locations, each series was tested in five locations. Across series, a total of 22 estimation experiments spanned harvest years 2016-2020 and seven different German locations: Hohenheim, Renningen, Gatersleben, Schackstedt, Böhnhausen, Asendorf, and Hadmersleben. All experiments were conducted following an alpha lattice design. Experiments were performed in an unreplicated (series 1, 2 and 4, plus three experiments of series 3), partially replicated (series 3) or completely replicated (series 5) fashion. For experiments of series 1 and 2, trialing and blocking was used to account and correct for uncontrolled spatial variation, while complete and/or incomplete blocks were considered for this purpose in series 3 to 5. For all series, the experimental unit corresponded to a plot, with sizes ranging from 6 to 9 m². In all experiments, wheat plants were cultivated under the rainfed continental conditions predominant at all considered locations. In all breeding value estimation experiments, plots were treated with fertilizers, herbicides, and pesticides following conventional local practices. Plots were harvested with a combine harvester, and plot yield was adjusted to a 140 g H₂O kg⁻¹ moisture basis and expressed in Mg ha⁻¹.
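The conversion of a harvested plot weight to Mg ha⁻¹ on a 140 g H₂O kg⁻¹ moisture basis can be sketched as follows. This is a minimal illustration that assumes the usual dry-matter-conserving adjustment; the function name and arguments are hypothetical and do not describe the software actually used for data recording.

```python
def yield_mg_per_ha(plot_yield_kg, plot_area_m2, moisture_g_per_kg,
                    target_moisture_g_per_kg=140.0):
    """Convert a fresh plot weight to Mg/ha at the target grain moisture.

    Assumption: dry matter is conserved, i.e. the same dry matter is
    re-expressed at the target moisture content.
    """
    dry_matter_kg = plot_yield_kg * (1000.0 - moisture_g_per_kg) / 1000.0
    adjusted_kg = dry_matter_kg * 1000.0 / (1000.0 - target_moisture_g_per_kg)
    # kg per m2 -> Mg per ha: multiply by 10,000 m2/ha and divide by 1,000 kg/Mg
    return adjusted_kg / plot_area_m2 * 10.0


# A 6 m2 plot yielding 6 kg of grain harvested at 183 g H2O per kg
example = yield_mg_per_ha(6.0, 6.0, 183.0)
```

In this example the wetter grain is scaled down to 9.5 Mg ha⁻¹, whereas the same plot weight harvested at exactly 140 g H₂O kg⁻¹ would correspond to 10 Mg ha⁻¹.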
Yield breeding value validation experiments. The feasibility of developing high-yielding pre-breeding material using breeding values as a tool for PGR parent selection was evaluated in early yield testing experiments […] crosses during 2016 and 2017, respectively. Two additional PGR lacking breeding value estimates were also considered as male parents in crosses during 2017. A set of 12 locally adapted European elite cultivars released between 2004 and 2015 were used as pollen receptors in two-way (Elite 1 × PGR) and three-way (Elite 2 × [Elite 1 × PGR]) crosses involving PGR. Seeds of segregating progenies from each of the eight two-way and 27 three-way crosses were increased and genetically fixed by two generations of selfing in Gatersleben. Besides roguing of off-types and fixing of true types, a two-stage selection was applied, based on visual assessment of single plants followed by rows in 0.5 m² plots, focusing on plant height and leaf health. Other than herbicides, no chemical treatments (e.g. fungicides or nitrogen fertilizers) were used for crop management during plant material purification. After these selection steps, at least one genotype per initial cross could enter early yield testing experiments, summing up to a total of 189 advanced F3:4 families across 35 initial crosses. In the breeding value validation experiments (Table 2), candidate families were evaluated for their yield per se performance together with 15 elite checks under conventional local agricultural practices. Experiments were conducted during harvest years 2020 and 2021, with two locations in each year: Gatersleben and Schackstedt. In all experiments, wheat plants were cultivated under the rainfed continental conditions predominant at both locations.
Elite checks corresponded to winter wheat cultivars approved for commercial use in Germany, with the French cultivar 'Arezzo' (released in 2007) being the oldest, while the German 'LGCharacter' and French 'RGTRitter' varieties (both released in 2020) were the newest. The check varieties 'RGTReform', 'Benchmark', and 'LGInformer' were commercially released in 2014, 2015, and 2018, respectively, and connected the four validation experiments, thus allowing an integrated analysis. Seven additional genotypes (coded as LSA_1-7) present in early yield experiments were lines from the LSA breeding program. Although the LSA lines lack passport data, they were kept in the datasets so as not to disrupt the estimation of field design effects. In all experiments, the experimental unit corresponded to a 6 m² plot. Entries were tested in a partially replicated fashion and an incomplete block design was used to correct for uncontrolled spatial variation. Plots were harvested using a combine harvester, and grain yield was adjusted to a 140 g H₂O kg⁻¹ moisture basis and expressed in Mg ha⁻¹.

Phenotypic data analyses.
A multiple-step strategy based on mixed models 50 was implemented for data curation and parameter estimation: Data curation and preparation. With the exception of yield breeding value validation experiments as well as the YR evaluations of SST_2019_6 and QLB_YR_2019 experiments, phenotypic data were outlier-corrected first by using the following general mixed model:

$$\text{Trait} = \mu + \text{Genotypes} + \text{Experiments} + \text{Genotypes} \times \text{Experiments} + \text{Trials} + \text{Replicates}(\text{Trials}) + \text{Blocks}(\text{Replicates}:\text{Trials}) + \text{Error}, \qquad (1)$$
where the common mean (μ) and genotype effects were treated as fixed factors, whereas experiments and their multiplicative interactions with genotypes, trials nested within experiments, replicates nested within trials, incomplete blocks nested within replicates and trials, as well as the model error nested within experiments were assumed as random and normally distributed.
In case of YR evaluations, Eq. (1) was fitted experiment-wise for each scoring timepoint (single, early or late). Therefore, effects of trials, experiments, and their interactions with genotypes were dropped from Eq. (1) according to each specific experimental design (Table 1). Normalized residuals of this model were obtained by subtracting their average and dividing them by their standard deviation. After this, residuals were tested experiment-wise for their significance as outliers following Anscombe and Tukey 51 and assuming a nominal alpha level of 0.01. Accordingly, datapoints flagged as outliers were discarded from final datasets.
For breeding value estimation experiments, outlier correction of yield data was performed series-wise, and trial and/or replicate effects in Eq. (1) were considered or ignored according to the experimental design(s) used in each series (Table 2). Yield data were then screened series-wise for significant outliers using method M4, "Bonferroni-Holm with rescaled median absolute deviation standardized residuals", as suggested by Bernal-Vasquez et al. 52 . Datapoints detected as significant outliers were discarded. Afterwards, yield records of series 1 to 4 were adjusted series-wise for trials, replicates and/or effects of incomplete blocks using Eq. (1) according to the experimental design(s) specific to each series (Table 2), while this adjustment was performed experiment-wise for series 5. In a next step, 161 hybrids (144 IPK plus 17 LSA hybrids) with low homogeneity and/or hybridity were discarded from the integrated dataset and IPK hybrids plus line parent genotypes were subtracted for parameter estimation.
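The logic of an outlier screen based on Bonferroni-Holm-adjusted tests of residuals standardized by the rescaled median absolute deviation (MAD) can be sketched as below. This is a simplified illustration rather than the implementation used for the yield data: the residuals are assumed to come from a previously fitted model, the rescaling constant 1.4826 and the two-sided normal test are the usual choices for this kind of screen, and the default significance level of 0.05 is an assumption, since the level applied to the yield data is not stated here.

```python
import math
from statistics import median


def flag_outliers(residuals, alpha=0.05):
    """Flag residuals as outliers with a Holm-adjusted two-sided normal test.

    Residuals are standardized by the rescaled MAD (1.4826 * MAD).
    alpha = 0.05 is an illustrative default, not a value taken from the study.
    """
    n = len(residuals)
    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad
    if scale == 0:
        return [False] * n  # no spread, nothing can be tested
    # Two-sided p-value of a standard normal deviate: P(|Z| > z) = erfc(z / sqrt(2))
    p_values = [math.erfc(abs(r) / scale / math.sqrt(2.0)) for r in residuals]
    order = sorted(range(n), key=lambda i: p_values[i])
    flags = [False] * n
    running_max = 0.0
    for rank, i in enumerate(order):
        # Holm step-down adjustment with enforced monotonicity
        adjusted = min(1.0, (n - rank) * p_values[i])
        running_max = max(running_max, adjusted)
        flags[i] = running_max < alpha
    return flags


residuals = [0.1, -0.2, 0.05, -0.1, 0.15, -0.05, 0.2, -0.15, 0.0, 5.0]
flags = flag_outliers(residuals)
```

Because the MAD is barely affected by the single extreme residual, only the last value is flagged in this toy example.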
Parameter estimations within experiments. Following data preparation, parameter estimation was first performed experiment-wise for YR-scores and yield breeding value validation experiments. Best linear unbiased estimations (BLUEs) of genotypes for YR-scores were computed for each scoring timepoint ("single", "early" or "late") of replicated experiments as well as for yield performance in each yield breeding value validation experiment. For this, effects of trials, experiments, and their interactions with genotypes were ignored in Eq. (1) and design effects were considered or skipped according to each specific experiment (Tables 1 and 2). Due to the absence of replications in QLB_YR_2019 and SST_2019_6, YR-scores in these particular experiments were adjusted outside the mixed-model framework using the means of the corresponding incomplete blocks. In parallel, variance components of single replicated experiments were estimated for each scoring timepoint in a similar fashion as BLUEs but assuming genotypes as random. Variance estimates were used to compute experiment-specific heritabilities as:
$$h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_{Error}^2 / N_{Rep.}}, \qquad (2)$$
where $\sigma_g^2$ and $\sigma_{Error}^2$ are the genotypic and error variance estimates, respectively, while $N_{Rep.}$ is the average number of effective replicates after considering missing plots and/or outlier-correction.
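As a small worked example of an experiment-specific heritability, the sketch below assumes the usual entry-mean form, in which the genotypic variance is divided by the sum of the genotypic variance and the error variance scaled by the average number of effective replicates. The variance values are invented for illustration and are not estimates from this study.

```python
def heritability(var_genotype, var_error, n_rep):
    """Entry-mean heritability of a single experiment (assumed standard form)."""
    return var_genotype / (var_genotype + var_error / n_rep)


# Hypothetical variance components and an average of two effective replicates
h2_example = heritability(var_genotype=2.0, var_error=1.0, n_rep=2.0)
```

With these made-up values the heritability is 2.0 / (2.0 + 0.5) = 0.8; fewer effective replicates, for instance after discarding outliers, lower the estimate.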
Parameter estimations across experiments. Large-scale YR screening experiments: first, a correlation test for BLUEs and/or experimental design corrected data was performed between experiments. Later, BLUEs of genotypes and variance components of YR-scores were obtained from the outlier-corrected data integrated across 12 experiments. For this, the trial effect was dropped from Eq. (1). The heritability across experiments was then computed as: […]
Precision balanced YR experiments: data of one genotype ('PilgrimPZO') were discarded from these integrated analyses due to material mislabeling. In a first step, correlations of BLUEs and/or experimental design corrected data were computed between experiments. In addition, the maximum value among early and late scorings, or the single timepoint scoring, was selected experiment-wise for each genotype based on single experiment BLUEs or, in the case of QLB_YR_2019, data corrected for incomplete-block effects. Using this integrated dataset, BLUEs were computed across experiments by fitting Eq. (1) but only considering μ as well as genotype, experiment and error effects. By assuming μ as a fixed factor and the remaining model effects as random, $\sigma_g^2$ was obtained, but the error term and the genotype × experiment interaction were confounded in this model. Assuming that the average of single-experiment error variance estimates equals […] is the variance estimate of the confounded error and interaction terms of the model. After this, Eq. (3) was used to estimate the heritability of YR-scores across precision experiments.
Yield breeding value estimation experiments: in a first step, correlations of BLUEs and/or experimental design corrected data were computed between experiments. Later, the following mixed model was fitted to the outlier-and-design corrected yield data from 22 estimation experiments:

$$\text{Yield} = \text{Type} + \text{Experiments} + \text{Lines} + \text{Hybrids} + \text{Lines} \times \text{Experiments} + \text{Hybrids} \times \text{Experiments} + \text{Error},$$
where Type includes the specific group mean of either lines ($\mu_{Lines}$) or hybrids ($\mu_{Hybrids}$) and was assumed as fixed, while hybrids and lines, experiments and their interactions with genotypes, as well as the error nested within experiments, were assumed as random factors. For lines, variance estimates […]
Yield breeding value validation experiments: BLUEs of genotypes and variance components of the yield performance across breeding value validation experiments were obtained using Eq. (1) but skipping replicate and trial effects and assuming a common error variance for all experiments. The heritability of yield performance across experiments was then computed as specified in Eq. (3).
Linear mixed models of phenotypic data analyses were fitted using the average information matrix algorithm for restricted maximum likelihood (REML) computation implemented in ASReml-R (v3.0 and 4.0) 53,54 .

Molecular diversity and linkage disequilibrium as captured by genotyping platforms.
Considering the total number of genotypes for each genotyping platform (GBS and WGS) as n and the total number of filtered markers as p, SNP-matrices can be represented as $M = (m_{si})$, with 1 ≤ i ≤ n and 1 ≤ s ≤ p. Given 1 ≤ j ≤ n, the Rogers' distance 55 between any i-th and j-th genotypes is calculated as: […]. For each row of M, homozygous states for reference and alternative alleles were coded as 0 and 2 according to RefSeq v1.0, respectively, while 1 represented the heterozygous state. A principal coordinate analysis (PCoA) was conducted on Rogers' distance matrices using the cmdscale() function in R. The first two principal coordinates (PCo1 and PCo2) were retained to portray the molecular diversity captured by GBS and WGS, respectively, using biplots. To investigate the level of concordance between GBS and WGS, a Mantel correlation test 56 as implemented in the vegan R package (v2.5-7) 57 was performed on the Rogers' distance matrices for 454 overlapping genotypes between both platforms. SNP filtering resulted in 29,846 GBS and 1,452,806 WGS markers having a minimum QUAL score of 40, a maximum missing value rate of 10%, ≥10 genotypes carrying any of both alleles in homozygous state, and up to 1% heterozygosity. This implied 24,091,446 and more than 67 billion intra-chromosomal marker combinations to be assessed for GBS and WGS platforms, respectively. To reduce the computational burden for WGS SNP markers, variants were chromosome-wise downsampled to an expected value of 10 markers per Mb, resulting in 145,307 markers across 21 chromosomes and the unanchored sequences. Intra-chromosomal linkage disequilibrium between marker (column) pairs of t(M) was estimated as the squared correlation coefficient (r²) 58 , while physical distances were computed as the pairwise Euclidean distance between SNP marker positions on RefSeq v1.0 of Chinese Spring 41 .
Efficient correlation computation was performed using the bigcor() function implemented in the propagate R package (v1.0-6) 59 . After this, cubic splines were fitted in R environment using smooth.spline() to estimate the r 2 decay as a function of the distances between marker pairs in different genetic pools: PGR samples, European elite cultivars and German elite breeding lines.
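The two pairwise statistics used in this section can be illustrated with a short, dependency-free Python sketch. It is not the custom R code deposited with the data. For the Rogers' distance it assumes the common simplification for biallelic markers coded 0/1/2, namely the mean absolute difference between two profiles divided by two, and it assumes that markers missing in either genotype are skipped pairwise; r² is computed as the squared Pearson correlation between two marker columns.

```python
def rogers_distance(profile_i, profile_j):
    """Rogers' distance between two genotypes coded 0/1/2 (None = missing).

    Assumption: for biallelic markers the distance reduces to the mean
    absolute difference of the codes divided by 2; missing pairs are skipped.
    """
    pairs = [(a, b) for a, b in zip(profile_i, profile_j)
             if a is not None and b is not None]
    if not pairs:
        return float("nan")
    return sum(abs(a - b) for a, b in pairs) / (2 * len(pairs))


def ld_r2(marker_a, marker_b):
    """Squared Pearson correlation between two marker columns (None = missing)."""
    pairs = [(a, b) for a, b in zip(marker_a, marker_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return float("nan")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")  # monomorphic marker, r2 undefined
    return cov * cov / (var_a * var_b)


distance = rogers_distance([0, 2, 1, None], [2, 2, 0, 0])
r2_linked = ld_r2([0, 2, 0, 2], [2, 0, 2, 0])
r2_unlinked = ld_r2([0, 0, 2, 2], [0, 2, 0, 2])
```

Two genotypes that carry opposite homozygous states at every marker have a distance of 1, and two markers whose alleles are perfectly associated across genotypes have r² = 1 regardless of which allele is the reference.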
Genomic-phenotypic data interoperability. The accuracy of the genomic best linear unbiased prediction (GBLUP) 60 was used as a quality measure for data interoperability of overlapping phenotypic and genomic datasets. Using matrix nomenclature, the mixed model 50 for GBLUP can be described as follows: […] correspond to the genotypic and error variance components of the model, respectively. For G matrix computation, profiles in M were coded as −1, 1, 0, for the reference and alternative alleles at homozygous and heterozygous states, respectively, while missing values were imputed using the average of observed values for each particular locus. Prediction accuracies of GBLUP were estimated by means of five-fold cross validations. For this, datasets containing both phenotypic and genomic data were randomly subdivided into five groups. The first four groups served together as the training set, whereas the fifth group corresponded to the prediction set. During prediction, the phenotypes of the prediction set were masked, while monomorphic markers across training and prediction sets were discarded for G matrix computation. After prediction, the accuracy was computed for genotypes in the prediction set as the Pearson correlation coefficient between predicted and observed values divided by h², with h² being the heritability of the corresponding predicted trait. The five groups were permuted, so that each of them serves exactly four times as training set, and one time as prediction set. The random subdivision into five groups was repeated 20 times, giving a total of 5 × 20 = 100 cross-validation runs.
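The resampling scheme described above (five groups, each serving once as prediction set, repeated 20 times) can be sketched in Python as follows. The sketch covers only the partitioning and the correlation step; fitting the GBLUP model itself is outside its scope. The function names, the fixed seed, and the interleaved assignment of genotypes to groups are assumptions made for illustration.

```python
import math
import random


def repeated_kfold(genotype_ids, k=5, repeats=20, seed=0):
    """Yield (repeat, fold, training_ids, prediction_ids) for repeated k-fold CV."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    for repeat in range(repeats):
        shuffled = list(genotype_ids)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for fold in range(k):
            prediction = folds[fold]
            training = [g for j, group in enumerate(folds) if j != fold
                        for g in group]
            yield repeat, fold, training, prediction


def pearson(x, y):
    """Pearson correlation between predicted and observed values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


runs = list(repeated_kfold(range(23)))
```

Iterating over `runs` gives the 5 × 20 = 100 training and prediction splits; in each of them the phenotypes of the prediction set would be masked before model fitting.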

Data Records
Raw sequencing reads: FASTQ files containing raw reads for 8,070 (GBS) and 768 (WGS) genotypes were deposited at the European Nucleotide Archive 62 : GBS 63 and WGS 64,65 . Sequenced genotypes can be found through their 'SAMEA' IDs on EMBL-EBI BioSamples 66 : a full list of integrated GBS and WGS 'SAMEA' BioSample IDs connected with plant material passports, passport data sources, SSD- and IPK genebank DOIs was deposited at e!DAL-PGP 67 and can be accessed here 68 .
SNP markers: variant calling results based on read mapping against the reference sequence of Chinese Spring (RefSeq v1.0) 42 were stored as VCF files. Unfiltered VCF files for GBS ('090222_8070_sample_unfiltered_snps_biosample.vcf.gz') and WGS ('070222_768_samples_wgs_no_filter_biosample.vcf.gz') data are located at the European Nucleotide Archive and can be accessed here 69 . These files contain information on 1,628,276 and 213,804,916 SNP markers with a minimum QUAL = 40 and polymorphic among 8,070 GBS and 768 WGS samples, respectively. Genotype names are coded using the respective 'SAMEA' BioSample IDs 68 . VCF files contain relevant information for each SNP regarding its chromosome, physical position on the chromosome in bp, reference and alternative alleles, as well as QUAL. R objects containing reduced numbers of SNP markers used for technical validations were deposited into the e!DAL-PGP repository and can be accessed here 70 .
(2022) 9:784 | https://doi.org/10.1038/s41597-022-01891-5 | www.nature.com/scientificdata
Phenotypic records were also deposited into e!DAL-PGP:
YR-scorings: infection severity of Puccinia striiformis f. sp. tritici on wheat plants was recorded in the field on plots and expressed on a scoring scale from 1 (no symptoms) to 9 (severe infection) following the official protocols of the German Federal Plant Variety Office 48 . Text files containing YR-scores were stored in ISA-Tab format. After outlier correction, the effective number of records from large-scale screening 71 and precision experiments 72 amounts to 35,043 and 15,353, respectively. Besides phenotypic records, each ISA-Tab file includes information that connects records with the corresponding plant material, incomplete block, replication, year, location, and experiment, in addition to plant material passports, passport data sources, SSD- and IPK genebank DOIs, as well as 'SAMEA' IDs. Ready-to-use BLUEs computed across large-scale screening experiments 71 for 7,682 PGR and 80 elite cultivars as well as BLUEs across precision experiments 72 for 199 elite cultivars and 600 SSD-PGR are available in the 'BLUEs_and_heritabilities' folder associated with each corresponding dataset.
Yield records: wheat grain yield was recorded in the field on plots and expressed in Mg ha⁻¹ on a 140 g H₂O kg⁻¹ moisture basis. Text files containing grain yield records were stored in ISA-Tab format. After correction for outliers and design effects, the effective number of phenotypic records in breeding value estimation experiments amounts to 7,407 73 . The ISA-Tab file also contains information to connect phenotypic records with the corresponding tested material, material type (hybrid or line), parents, year, location, experiment, and series, in addition to plant material passports of parent lines, their passport data sources, SSD- and IPK genebank DOIs, as well as 'SAMEA' IDs. Ready-to-use breeding values computed across estimation experiments for 707 PGR can be found in the respective 'Breeding_values_and_heritabilities' folder 73 . For breeding value validation experiments, the number of yield records is 739 74 . The corresponding ISA-Tab file also includes the information needed to connect phenotypic records to the respective plant material, FAMILY-DOI, incomplete block, year, location, and experiment, in addition to plant material passports of check cultivars and parent lines, their passport data sources, SSD- and IPK genebank DOIs, as well as 'SAMEA' IDs. Ready-to-use BLUEs of grain yield computed across validation experiments are also available for 189 advanced F3:4 progenies and 15 elite cultivar checks in the corresponding 'BLUEs_and_heritabilities' folder 74 .
For more details on genomic and phenotypic data production, preparation, and processing, please refer to the Methods section. Machine-readable details are also included in an 'i_investigation.txt' file associated with each phenotypic dataset 71-74 .

Technical Validation
Genotyping-by-sequencing is a precise and cost-efficient platform to study molecular diversity in genebanks.
For GBS, the average total read count per genotype (after trimming) was ~2.64 million, while a WGS sample had on average ~354.8 million reads (Table 3). These numbers decreased slightly to ~2.59 million (98.3% of total) and ~349.1 million (98.4%) after read mapping against Chinese Spring (RefSeq v1.0) 42 , with 62.7% (GBS) and 47.7% (WGS) of the average read count per genotype having a mapping quality >q20. These reads (>q20) were retained during variant calling and yielded VCF files for 8,070 GBS and 768 WGS samples, respectively. SNP markers with QUAL ≥40, a missing value rate ≤10%, at least 10 genotypes carrying each allele in homozygous state, and a maximum of 1% heterozygosity were used to assess the molecular diversity, linkage disequilibrium and genomic-phenotypic data interoperability. The latter is presented after the quality assessment sections for phenotypic data. After filtering, a total of 29,846 SNP markers across 8,070 genotypes were available for GBS, while the SNP-matrix for WGS contained 1,452,806 markers across 768 genotypes 70 .
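The marker-level filter criteria listed above can be expressed compactly, as in the following Python sketch. It is an illustration and not the BCFtools and R workflow used here: genotype calls are assumed to be coded 0/1/2 with `None` for missing values, and computing the heterozygosity rate over non-missing calls is an assumption, since the denominator is not specified in the text.

```python
def keep_marker(qual, calls, min_qual=40, max_missing=0.10,
                min_homozygous=10, max_heterozygosity=0.01):
    """Return True if a marker passes the filter criteria stated in the text.

    calls: one value per genotype, 0 (homozygous reference), 1 (heterozygous),
    2 (homozygous alternative) or None (missing).
    Assumption: heterozygosity is computed over non-missing calls.
    """
    n = len(calls)
    if n == 0 or qual < min_qual:
        return False
    n_missing = sum(call is None for call in calls)
    if n_missing / n > max_missing:
        return False
    # Each allele must be carried in homozygous state by enough genotypes
    if calls.count(0) < min_homozygous or calls.count(2) < min_homozygous:
        return False
    n_observed = n - n_missing
    if n_observed == 0:
        return False
    return calls.count(1) / n_observed <= max_heterozygosity


balanced = [0] * 100 + [2] * 100
rare_allele = [0] * 195 + [2] * 5
too_heterozygous = [0] * 100 + [2] * 97 + [1] * 3
```

Under these settings `balanced` passes, whereas `rare_allele` fails the minimum count of homozygous carriers and `too_heterozygous` exceeds the 1% heterozygosity limit.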
A PCoA was conducted on the Rogers' distances to assess the molecular diversity among the 8,070 GBS and the 768 WGS samples (Fig. 3a,b). Rogers' distance matrices and a custom R code for their computation are also available here 70 . PCo1 and 2 explained 9.5% and 5.6% of the molecular variation portrayed by GBS-SNP markers, respectively, while the reduced number of samples for WGS slightly increased the percentage of explained variation to 13.1% (PCo1) and 7.1% (PCo2). As expected, PGR samples expand the molecular diversity of the elite pool, whose genotypes cluster very close to each other in the left corner of biplots. Coordinates of the German elite breeding lines formed a slightly more contracted group than elite cultivars (Fig. 3b). Nevertheless, the good overlap between these two latter groups reflects the continuous material exchange that takes place between European wheat breeders 75,76 .
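For readers who want to see the distance measure itself, a minimal Python sketch of Rogers' distance between two genotypes is given below. It assumes biallelic SNPs coded as allele dosage (0, 1, 2; None for missing) and averages only over loci called in both genotypes. The deposited R code remains the reference implementation, and the names used here are hypothetical.

```python
from math import sqrt


def rogers_distance(dosage_i, dosage_j):
    """Rogers' distance between two genotypes for biallelic markers.

    Per locus: sqrt(0.5 * sum over alleles of squared frequency differences),
    averaged over all loci with non-missing calls in both genotypes.
    """
    total, n_loci = 0.0, 0
    for g_i, g_j in zip(dosage_i, dosage_j):
        if g_i is None or g_j is None:
            continue  # skip loci with a missing call (assumption)
        p_i, p_j = g_i / 2.0, g_j / 2.0  # frequency of the counted allele
        q_i, q_j = 1.0 - p_i, 1.0 - p_j  # frequency of the other allele
        total += sqrt(0.5 * ((p_i - p_j) ** 2 + (q_i - q_j) ** 2))
        n_loci += 1
    if n_loci == 0:
        raise ValueError("no locus with calls in both genotypes")
    return total / n_loci


def distance_matrix(samples):
    """Pairwise distances for a dict mapping sample name to dosage list."""
    names = sorted(samples)
    return {(a, b): rogers_distance(samples[a], samples[b])
            for a in names for b in names}


if __name__ == "__main__":
    toy = {"PGR_1": [0, 0, 2, 2], "PGR_2": [0, 2, 2, None], "Elite_1": [2, 2, 0, 0]}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```

A matrix of this kind is the input on which a principal coordinate analysis operates.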
The intra-chromosomal decay of linkage disequilibrium (r²) as a function of physical distance was estimated by fitting cubic spline curves for each genotypic group and genotyping platform (Fig. 3c,d). Independently of the genotyping platform and as already observed in past studies 77,78, r² values decay faster in genebank samples than in modern elite genotypes. European elite cultivars in turn showed a slightly faster decay of linkage disequilibrium than the German elite breeding lines (Fig. 3d). Since differences in r² decay between GBS (Fig. 3c) and WGS (Fig. 3d) platforms were less pronounced for elite cultivars, we attribute the faster r² decay in PGR samples portrayed by GBS to the large difference in population size (7,745 GBS vs 446 WGS samples) rather than to the genotyping platforms themselves.
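The idea of an r² decay curve can be illustrated with the short Python sketch below. It assumes that r² is taken as the squared Pearson correlation of allele dosages between two SNPs on the same chromosome. In place of the cubic spline fit, it uses a plain average of r² within distance bins, which is a deliberately simpler stand-in. All names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation of two dosage vectors (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # monomorphic marker among the shared calls
    return sxy * sxy / (sxx * syy)


def ld_decay(positions, dosages, bin_width_bp):
    """Mean r2 per physical-distance bin for SNPs of one chromosome.

    positions : physical positions in bp, one per SNP
    dosages   : one dosage vector (across genotypes) per SNP
    Returns a dict mapping the lower bound of each bin to the mean r2.
    """
    bins = {}
    for (p1, g1), (p2, g2) in combinations(list(zip(positions, dosages)), 2):
        r2 = r_squared(g1, g2)
        if r2 is not None:
            bins.setdefault(abs(p2 - p1) // bin_width_bp, []).append(r2)
    return {b * bin_width_bp: mean(v) for b, v in sorted(bins.items())}


if __name__ == "__main__":
    pos = [100, 200, 5000]
    snps = [[0, 0, 2, 2], [0, 0, 2, 2], [0, 2, 0, 2]]
    print(ld_decay(pos, snps, 1000))
```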
As also reported in our companion study 14, the correlation between GBS- and WGS-based Rogers' distances among 454 genotypes characterized with both platforms amounted to 0.88 (Mantel correlation test p-value = 0.001). Compared with correlation estimates from past works that contrasted GBS with other mainstream genotyping platforms to assess crop plant genebank diversity 7,79, the correspondence between GBS and WGS observed in our study is very high. This is noteworthy, considering that WGS reads deliver the least biased genome representation possible. GBS is a cost-effective and simple method that reduces the complexity of genomes. On the one hand, complexity reduction obviously limits the depth of analysis for large and complex genomes of species such as hexaploid wheat. On the other hand, if the primary objective of characterizations is the macro-assessment of molecular diversity and linkage disequilibrium in wheat populations, GBS is the method of choice 7,79.
Large-scale screening and precision experiments revealed yellow rust resistance as a rare phenotype in the German Federal ex situ genebank. After outlier correction, heritability estimates for replicated experiments were in general higher than 0.7, with the only exception being SST_2018_5 (h² = 0.54) (Table 4). The highest heritabilities (h² = 0.92) were estimated in inoculated precision experiments ROS_ and WTZ_YR_2019. Due to material mislabeling during the conduct of experiments, the cultivar 'PilgrimPZO' was completely discarded from further phenotypic analyses. The effective number of entries with YR scores (either BLUEs or single point values) ranged from 1,395 to 1,669 per individual screening experiment and between 722 and 797 for precision experiments. Because of the unbalanced structure of large-scale screening experiments, the pairwise entry overlap ranged between four and 1,641 common genotypes.
In contrast, these numbers were higher in balanced precision experiments, ranging from 697 to 797 common entries between experiments. In our companion study 14, PGR tested in the first five large-scale screening experiments constituted the base population for trait-customized core selection of the 600 SSD-PGR tested in precision experiments. For this reason, the pairwise entry overlap between large-scale screenings and precision experiments drops drastically from 259-339 to 13-26 in later experiments. All significant pairwise correlations (p-value < 0.05) between design-corrected YR scores from different experiments were positive and ranged between 0.29 and 0.92. In analyses across experiments, the heritability of YR scores was 0.82 for the large-scale screening and 0.89 for precision experiments. Presumably due to unbalanced phenotyping, a very small proportion (0.4%) of BLUEs computed across large-scale screening experiments for 7,682 PGR plus 80 elite cultivars lay outside the 1-9 parametric space (Fig. 4). This bias was not observed in the BLUEs computed across precision experiments for 199 elite cultivars and 600 SSD-PGR. Nevertheless, such a bias is negligible considering the strong correlation [r = 0.77, −log10(p-value) = 128.4] of BLUEs across experiments for the overlapping material between precision and large-scale screening experiments. Large-scale screening experiments revealed that only a small PGR fraction (8.4%) has fewer infection symptoms than an average elite cultivar. We showed in our companion study 14 that this resistant PGR fraction is enriched with material of European origin that entered the IPK genebank during recent decades.
The implemented trait-customized core selection approach 14 more than tripled the YR-resistant PGR proportion (to 27.5%) in precision experiments while increasing exotic molecular diversity and reducing the association between population structure and trait variation. This provided the base to identify genetically diverse PGR donors of YR resistance sources not yet used in elite breeding 14 validated using classical and functional genetics approaches. Heritabilities, BLUEs and custom R codes for their computation are also available in the respective 'BLUEs_and_heritabilities' folders 71,72.
Yield breeding values of plant genetic resources to inform breeders and initiate pre-breeding programs. After outlier-and-design correction, the effective number of entries with yield records ranged between 238 and 500 for individual breeding value estimation experiments. Due to the series-wise strategy to test plant material, the highest numbers of overlapping entries with yield records were observed between experiments conducted within the same year (Table 5). In more detail, the number of pairwise overlapping entries among the 22 breeding value estimation experiments ranged between 0 and 500. All significant pairwise correlations (p-value < 0.05) between yield records of different experiments were positive, with magnitudes ranging between 0.11 and 0.96. Across experiments, 37 elite cultivars in addition to 227 PGR plus 1,429 'Elite × PGR' and four 'Elite1 × Elite2' F1 hybrids have yield records in the outlier-and-design corrected dataset. The heritability of per se yield performance amounted to 0.89 for elite cultivars and PGR together, while the heritability of hybrid performance was 0.50, as also reported in our companion work 14.
The 1,429 'Elite × PGR' hybrids originate from crossing 36 elite cultivars with 205 PGR and 510 SSD-PGR which, put together, trace back to 707 PGR tested in 1,427 merged 'Elite × PGR' hybrid crosses. Yield breeding values of PGR computed across the 22 estimation experiments ranged between 6.11 and 7.11 Mg/ha, with a mean of 6.79 (Fig. 5a). As reported in our companion work 14, the heritability of breeding values was 0.32, which reflects the complexity of handling less than half of the genetic variation underlying yield in hybrids. A custom R code for breeding value estimation of PGR and heritability computations as well as its expected outputs are available in the 'Breeding_values_and_heritabilities' folder 73.
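The pairwise summaries used above to describe unbalanced experiments, namely the number of overlapping entries and the correlation of records on shared entries, can be sketched in Python as follows. The sketch assumes one design-corrected value per entry and experiment and does not include a significance test. Experiment and entry names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; None if either vector has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def pairwise_overlap_and_correlation(experiments, min_common=3):
    """Summarise every pair of experiments.

    experiments : dict mapping experiment name to {entry: value}
    Returns {(exp_a, exp_b): (number of common entries, correlation or None)}.
    The correlation is only computed when at least `min_common` entries overlap.
    """
    summary = {}
    for a, b in combinations(sorted(experiments), 2):
        common = sorted(set(experiments[a]) & set(experiments[b]))
        r = None
        if len(common) >= min_common:
            r = pearson([experiments[a][e] for e in common],
                        [experiments[b][e] for e in common])
        summary[(a, b)] = (len(common), r)
    return summary


if __name__ == "__main__":
    toy = {
        "Exp_A": {"g1": 6.0, "g2": 7.0, "g3": 8.0},
        "Exp_B": {"g1": 6.5, "g2": 7.5, "g3": 8.5, "g4": 7.0},
        "Exp_C": {"g5": 7.0},
    }
    for pair, (n_common, r) in pairwise_overlap_and_correlation(toy).items():
        print(pair, n_common, r)
```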
Based on preliminary results from estimation experiments conducted during harvest year 2016, 13 PGR were selected for their superior breeding value to initiate a small pre-breeding program using 11 European elite cultivars as breeding value receptors (Fig. 5). Computed across the 22 estimation experiments, breeding values of selected PGR ranged from 6.80 to 7.05 Mg/ha and were superior to the general mean of breeding value estimates (Fig. 5a). Progenies from seven Elite1 × PGR and 25 Elite2 × [Elite1 × PGR] initially performed crosses were advanced to F3:4 and pre-selected for good visual performance. In total, 173 advanced F3:4 progenies, with at least one progeny per initial cross, were tested for grain yield together with 15 elite cultivar checks and 16 additional pre-breeding lines in four validation experiments conducted in two locations during harvest years 2020 and 2021 (Fig. 5b). A custom R code for BLUEs and heritability computation as well as its expected output files are available in the 'BLUEs_and_heritabilities' folder 74. Yield performances were highly repeatable, which was reflected by the heritabilities within (h² ≥ 0.76, Table 6) and across (h² = 0.76) validation experiments as well as by the significant positive correlations among them (r ≥ 0.68, Table 6). The highest yields were observed in general for elite check cultivars, with grain yield values ranging between 5.94 and 7.48 Mg/ha (Fig. 5b). For them, the year of cultivar release was significantly and positively correlated with the yield performance (r = 0.79, p-value < 0.001), reflecting the advances in yield breeding achieved between years 2007 and 2020. Among […] increased in general the number of progenies per initial cross with good visual performance to an average of 6.7, with the least and most prolific crosses having one and 21 F3:4 progenies, respectively. Two-way crosses were as a group also significantly less competitive than three-way crosses ($\mu_{\text{Three-way}} - \mu_{\text{Two-way}} = 0.72$ Mg/ha, p-value < 0.001). The lower outputs for two-way crosses could be attributed to the increased proportion of deleterious PGR background still present in them, which also indicates that a realistic use of PGR variation for pre-breeding is achieved through three-way crosses. In fact, three-way crossing schemes are already the main strategy to introduce PGR variation into large-scale global pre-breeding programs such as Seeds of Discovery 22 (https://seedsofdiscovery.org/).
Fig. 6 Distributions of cross-validated interoperability between genomic and phenotypic data. Genotyping platforms were genotyping-by-sequencing (GBS) and whole-genome sequencing (WGS, 3-fold coverage), while phenotypes corresponded to the best linear unbiased estimates for yellow rust (YR) scorings computed across large-scale screening or precision experiments, as well as yield breeding values (BV) computed across estimation experiments. Interoperability was estimated as the genomic prediction accuracy using 80% of the integrated data as training and 20% as validation set in 100 cross-validation runs. The total number (N) of samples with phenotypes and polymorphic SNP markers used for cross-validations according to each genotyping platform is shown as a table on the left side. In case of GBS*, the same training and validation phenotypes used for WGS were considered. For more details on cross-validations, please see Methods. In distributions, diamonds, horizontal and vertical lines correspond to the average, standard deviation and median, respectively. Violin plots were obtained using the vioplot R package (v0.3.7).
Although none of the PGR-derived progenies was as competitive as the best check cultivar 'Informer' (released in 2018), among the 173 F3:4 progenies whose PGR parents were selected based on breeding values, three (2%) and 15 (9%) had better yield performance than the best newest cultivar ('LGCharacter', released in 2020) and the locally most grown cultivar ('RGTReform', released in 2014), respectively. Among the additional 16 pre-breeding lines whose PGR parents lack breeding value estimates, none reached these yield levels. Global efforts of the Seeds of Discovery initiative led to the development of 2,867 pre-breeding lines that trace back to 366 exotic wheat PGR 22. Multiple environment yield trials conducted in Central America and South Asia revealed that locally, up to ~2% of these pre-breeding lines have better yield performance than the best adapted varieties. Moreover, pre-breeding lines with beneficial traits trace back to 62 of the 366 exotic founders (17%) used as PGR parents by Seeds of Discovery 22. In our small pre-breeding program established using breeding values as parent selection tool for PGR (Fig. 5), competitive pre-breeding lines trace back to eight of the 13 selected PGR (62%) parents. All in all, at least as regards visual performance and yield, our validation experiments suggest that using breeding values as a selection tool could boost the input-to-output ratio for pre-breeding programs.
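The proportions quoted in this paragraph follow directly from the reported counts, as the short Python check below shows. It only reproduces arithmetic on numbers given in the text, and the labels are shorthand chosen here.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Counts as stated in the text above: (numerator, denominator)
counts = {
    "F3:4 progenies above 'LGCharacter'": (3, 173),
    "F3:4 progenies above 'RGTReform'": (15, 173),
    "selected PGR parents with competitive progenies": (8, 13),
    "Seeds of Discovery founders with beneficial lines": (62, 366),
}

if __name__ == "__main__":
    for label, (part, whole) in counts.items():
        print(f"{label}: {part}/{whole} = {percent(part, whole):.1f}%")
```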
High genomic prediction accuracies support the suitability of data for integrated phenotypic-genomic analyses. Seed mixtures and sample mislabeling, among other sources of systematic error, can occur in large-scale characterizations. These obviously disrupt the connectivity between genotype and phenotype and, in turn, decrease the value of the data for integrated analyses. To rule out the presence of such inconsistencies, we used the cross-validated accuracy of genomic prediction as a quality measure for genomic-phenotypic data interoperability (Fig. 6).
Integrating YR-scorings and GBS data resulted in 6,371 and 799 genotyped samples having records in large-scale screenings and precision experiments, respectively (Fig. 6). In cross-validations, the genomic-phenotypic data interoperability between YR-scorings and GBS was in general high, with prediction accuracies of 0.72 ± 0.02 and 0.69 ± 0.05 for large-scale screening and precision experiments, respectively. As expected from past simulation studies 80,81, the slightly higher accuracy for large-scale screenings than for precision experiments can be attributed to the ~8 times larger population size of the former group, which in turn also provided a small increase (~6%) in the number of polymorphic markers used for prediction. Shifting to WGS decreased population sizes for large-scale screenings and precision experiments to 309 and 454 samples, but accuracies dropped only slightly to 0.69 ± 0.09 and 0.66 ± 0.08, respectively. Moreover, the ~55-fold increase in the number of polymorphic markers from WGS provided practically no improvement in accuracy over GBS when the same population size was considered for both genotyping platforms (see GBS* in Fig. 6). This last observation is not surprising considering that GBLUP 60 mostly relies on relatedness for prediction 80,81 and that, as mentioned before, GBS- and WGS-based Rogers' distances were highly correlated. Although beyond the scope of this work, we anticipate that alternative genomic prediction methods less dependent on relatedness 80,81 would benefit more from the increased marker densities provided by WGS.
GBS samples of PGR that also have breeding value estimates amount to 597 (Fig. 6). Prediction accuracies of 0.68 ± 0.1 also support the high genomic-phenotypic data interoperability for these samples and come very close to the estimates presented in our companion study using 1,000 instead of 100 cross-validations 14. Last but not least, only 24 PGR samples have both WGS data and breeding value estimates, a number of genotypes too limited to meaningfully assess the genomic-phenotypic data interoperability. Although integrated analyses for WGS and yield breeding values are currently not advisable, PGR are available upon request, and we thus fully encourage future activities that increase the connectivity between these two types of data. A custom R code to assess the genomic-phenotypic data interoperability as well as its needed inputs and expected outputs are available here 70.
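The cross-validation scheme used as interoperability measure (80% training, 20% validation, 100 runs, accuracy taken here as the Pearson correlation between predicted and observed values) can be sketched in Python as shown below. The sketch is independent of the deposited R code. The predictor is a deliberately simple similarity-weighted mean that stands in for GBLUP and is not equivalent to it. All function names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; NaN if either vector has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def make_similarity_predictor(markers, phenotypes):
    """Stand-in for GBLUP (assumption): each validation sample is predicted
    as the mean training phenotype weighted by marker identity."""
    def similarity(a, b):
        same = sum(1 for x, y in zip(markers[a], markers[b]) if x == y)
        return same / len(markers[a])

    def predict(train, valid):
        predictions = []
        for v in valid:
            weights = [similarity(v, t) for t in train]
            total = sum(weights)
            if total == 0:  # fall back to the unweighted training mean
                predictions.append(sum(phenotypes[t] for t in train) / len(train))
            else:
                predictions.append(
                    sum(w * phenotypes[t] for w, t in zip(weights, train)) / total)
        return predictions
    return predict


def cross_validated_accuracy(sample_ids, phenotypes, predict,
                             n_runs=100, train_fraction=0.8, seed=1):
    """Repeated random splits; returns one accuracy per run."""
    rng = random.Random(seed)
    ids = list(sample_ids)
    n_train = int(round(train_fraction * len(ids)))
    accuracies = []
    for _ in range(n_runs):
        rng.shuffle(ids)
        train, valid = ids[:n_train], ids[n_train:]
        observed = [phenotypes[s] for s in valid]
        accuracies.append(pearson(predict(train, valid), observed))
    return accuracies
```

A low or erratic accuracy obtained with such a scheme on real data would hint at broken links between genotypes and phenotypes, which is the rationale for using it as a quality measure.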

Usage Notes
We expect that these FAIR data support and encourage future research and breeding initiatives that further valorize crop plant genebanks. The genebank material of the 'TRI' catalogue is available upon request using IPK genebank DOIs and can be accessed through GBIS (gbis.ipk-gatersleben.de) under the conditions of a standard material transfer agreement (SMTA). Seeds of field isolated accessions and pre-breeding material can be requested upon availability through their SSD-and FAMILY-DOIs, respectively, by following SMTA conditions as well (contact e-mail: reif@ipk-gatersleben.de).

Code availability
The custom awk script for filtering of VCF files is available at e!DAL-PGP and can be accessed here 82. Custom R codes for phenotypic parameter estimations are included within the respective 'R_code' subfolder of each dataset 71-74 deposited into e!DAL-PGP. In addition, custom R codes to assess the genomic-phenotypic data interoperability and to compute Rogers' distances, as well as their needed inputs and expected outputs, were also deposited into e!DAL-PGP and can be accessed here 70.
(2022) 9:784 | https://doi.org/10.1038/s41597-022-01891-5 | www.nature.com/scientificdata